Lecture � CS187 Computational linguistics

Greg Detre

14:30 on Friday, September 20, 2002

Prof. Stuart Shieber,MD G-125

 

computational linguistics � study of human language using tools + techniques of compsci

 

applications

transcribing, categorising, translating, controlling, dividing, restoring/editing, annotating, changing

 

speech recognition, e.g. 1-800-COLLECT � do you accept this call? listens just for �yes�

 

Descartes, letter to Marquess of Newcastle (1646???) � flexibility of verbal behaviour as the criterion of a �rational soul�

 

considers the Bladerunner �Turing test� for humans vs replicants of iris dilation � a property completely independent of intelligence

 

the problem of computational linguistics is �AI-complete� (by analogy with NP-complete)

 

CL vs NLP

CL � scientific side

NLP � engineering side

 

Shieber�s areas of interest concern mainly techniques rather than applications

both engineering + scientific

 

what�s a natural language?

artificial language, e.g. C++, or Esperanto, Hebrew, also Korean alphabet

 

well-formedness

how useful is grammaticality to NLP applications?

people don�t really speak grammatically for the most part

although different dialects have subtly different grammars

people do follow certain rules when they form ill-formed sentences

there are degrees of well-formedness

epsilon well-formed � distance measure of how well-formed

would a system that could distinguish whether or not something is well-formed benefit?

maybe use a probabilistic method in assessing well-formedness

but this is still assuming a binary distinction of well- or ill-formed

can we actually distinguish well-formedness from meaningfulness?

maybe well-formedness is secondary to grammaticality (e.g. Schank)

 

Hockett�s theory (in �A manual of phonology�, 1955)

he wants to give an algorithmic model of human speech

pg 4, section 0211-0213

proposes the possibility of a talking dog � �let me add at once that I am entirely serious�

how good is his theory?

it characterises only but not all the well-formed English strings over that vocabulary

you could extend it to all the well-formed strings if you made it huge (because it doesn�t really recurse)

 

What kinds of languages are describable in general by Hockett�s method?

allows for infinite loops

is it a finite state machine or a context-free grammar?

if you could generate anbn, then you could show that it�s regular??? what�s �regular�???

all you know is what state you�re in, and there�s only a finite number of those � you don�t have a stack, and you don�t have a memory or a count

this is a finite state model, and this is the transition matrix

Is English (viewed as a set of strings) a regular language?

no?

Chomsky invented the Chomsky hierarchy (type 0 � 3) to try to prove that English (and all natural language) isn�t a regular language

wwR

parentheses

anbn (e.g. anti- anti-missile missile missile � but greater than about 2 levels, it ceases to be meaningful English)

relative clauses

the cat likes tuna fish

the cat the dog chased likes tuna fish

the cat the dog the rat bit chased likes tuna fish

maybe what the English language is is anchored to the cognitive limitations of its speakers

if you want to limit English sentences to less than 1 million words, then English contains a finite number of strings, and is therefore regular

but consider C++�???

what about the practical limit of the length of our lives?

you just have to make idealisations (just as in physics)

but the difference here may be that it�s not at all obvious that (unlike the classical physical laws of the universe) there is a determinate answer as to whether a given sentence is grammatical within the English language

how do you prove that English is/not regular?

to prove that English is not regular is hard, because there can be non-regular subsets of a regular language

apply a pumping lemma directly???

closed

apply the homomorphism that turns all nouns to a and all verbs to b (in the cat/dog example above)

the argument is that you end up anbn

intersect English with a regular language � remember that regular languages are closed under intersection with other regular languages

 

The Chomsky hierarchy

type 0: recursively enumerable

type 1: context-sensitive

type 2: context-free

type 3: regular

invented this to show that Hockett�s theory is inadequate

 

Are natural languages context-free?

Chomsky � context-free grammars are inappropriate for paralleling natural language

1981 � Gasdar + Pullam(sp???) � claimed that every previous proof that natural language is not context-free was wrong (for empirical or theoretical reasons)

1986 � the issue was closed, because Gasdar + Pullam �???

to show non-regularity, we showed dependency

to show non-context-freeness what pattern of dependency would you have to show?

what did Toad bake ___?

who met John ___? whom did John meet ___?

Norweigan may allow multiple questions

if you have finitely-many dependencies, you can encode them � but if you have arbitrarily many �

if you have cross-serial dependencies (i.e. that aren�t nested within each other, but cross over each other)�

e.g. John, Mary and Bill bought a Ford, a Porsche and a Pontiac respectively

or better: John, Mary and Bill killed himself, herself and himself respectively

 

if one particular natural language is shown not to be CF, then (if you assume that there is something universal about natural language) chances are that you�ve shown something about natural language in general

 

Admin

Enrollment

limited to 33 � needs to be 20

only 13 are deadly serious

due in by Mon morning, returned by Mon eve

Structure of the course

first half class/lecture (lots of discussion), second half project

 

Questions

Star Trek � The ultimate computer � computer has control of the Enterprise, but Kirk convinces the computer to relinquish control

is Esperanto as messy as other �real� natural languages???

apparently yes, like Hebrew

recur (means again, as in recursive) rather than recurse (curse again)